Goto

Collaborating Authors

 scsg method


Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing Systems

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $E \|\nabla f(x)\|^{2}\le \epsilon$ is $O(\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\})$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.


Reviews: Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing Systems

Summary: This paper proposes a variant of a family of stochastic optimization algorithms SCSG (closely related to SVRG), and analyzes it in the context of non-convex optimization. The main difference is that the outer loop computes a gradient on a random subset of the data, and the stochastic gradients in the inner loop have a random cardinality and are not restricted to those participating in the outer gradient. The analysis of the stochastic gradients despite this mismatch is the main technical contribution of the paper. The result is a set of new bounds on the performance of the algorithm which improve on both sgd and variance reduced algorithms, at least in the low-medium accuracy regimes. Experimental evidence supports the claims of improvement in objective reduction (both training error and test error) per iteration both before the end of the first pass on data and after, both by SCSG over SGD and by varying batch size variant over the fixed size variant, but only on two networks applied to the venerable and small MNIST dataset. Pros of acceptance: - The claimed results are interesting and shed light on an important regime.


Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing Systems

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $E \ abla f(x)\ {2}\le \epsilon$ is $O(\min\{\epsilon {-5/3}, \epsilon {-1}n {2/3}\})$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.